This notebook explores the Opération Fourmis public inventory data and summarises it across a few different potential variables of interest.
The coordinates associated with the public inventory data are probably approximately accurate overall. My concern is that even relatively minor inaccuracies in the coordinates will cause a lot of misalignment between the local GIS layers and where the ants were actually collected. Even with ideal location recording methods and conditions, the GPS on devices like smartphones typically has a precision of about ± 5 m. Before assigning habitat or land use categories to the collections, we need to know how reliable the coordinates are for that purpose, and constrain the questions and analyses accordingly.
GEOPRECISION: Coordinate source
One concern with extracting local variables like habitat, land use, distance from the nearest road, or distance from the nearest building is that doing so requires a lot of confidence in the latitude and longitude associated with the point locations. The column GEOPRECISION indicates whether the location was extrapolated, corrected, or measured (or some combination).
| GEOPRECISION | Tubes | Percent | Percent (non-NA) |
|---|---|---|---|
| mesuré | 6159 | 89.5% | 89.6% |
| extrapolé | 624 | 9.1% | 9.1% |
| extrapolé/corrigé | 44 | 0.6% | 0.6% |
| extrapolé mauvais | 17 | 0.2% | 0.2% |
| NA | 12 | 0.2% | - |
| mesuré/corrigé | 11 | 0.2% | 0.2% |
| extrapolé (base tube précédent) | 6 | 0.1% | 0.1% |
| extrapolé (église par défaut) | 5 | 0.1% | 0.1% |
| extrapolé (gare par défaut) | 4 | 0.1% | 0.1% |
| extrapolé/corrigé (église par défaut) | 1 | 0.0% | 0.0% |
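
As a check on the table above, the two percentage columns can be recomputed from the tube counts. This is a minimal sketch in base R that copies the counts from the table; the object names (`tubes`, `usable`) are illustrative and not part of the original notebook.

```r
# Tube counts copied from the GEOPRECISION table above.
tubes <- c(
  "mesuré" = 6159, "extrapolé" = 624, "extrapolé/corrigé" = 44,
  "extrapolé mauvais" = 17, "NA" = 12, "mesuré/corrigé" = 11,
  "extrapolé (base tube précédent)" = 6,
  "extrapolé (église par défaut)" = 5,
  "extrapolé (gare par défaut)" = 4,
  "extrapolé/corrigé (église par défaut)" = 1
)

total <- sum(tubes)                     # all tubes
total_non_na <- total - tubes[["NA"]]   # tubes with a recorded GEOPRECISION

pct <- round(100 * tubes / total, 1)
pct_non_na <- round(100 * tubes / total_non_na, 1)
pct_non_na["NA"] <- NA                  # not defined for the NA row

# Categories that could, in theory, be used directly.
usable <- tubes[c("mesuré", "extrapolé", "extrapolé/corrigé", "mesuré/corrigé")]
usable_share <- sum(usable) / total

print(data.frame(tubes, pct, pct_non_na))
```

The four categories that could in theory be used directly account for 6838 of the 6883 tubes.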
The coordinates were mostly measured directly by the collector, and only a small proportion were extrapolated badly. In theory, we could assume that mesuré, extrapolé, extrapolé/corrigé, and mesuré/corrigé indicate that the coordinates can be used directly.
The number of reported decimal places gives a rough estimate of precision for coordinates reported in decimal degrees, but not for the Swiss coordinate system, which reports 6 digits no matter what. For latitude, and for longitude at the equator, an arc-degree corresponds to about 111 km. At a latitude of about 46°N, an arc-degree of longitude is shorter, about 76.5 km.
| Decimals | Precision (Lat.) | Precision (Lon.) |
|---|---|---|
| 1 | ± 5550 m | ± 3825 m |
| 2 | ± 555 m | ± 383 m |
| 3 | ± 55.5 m | ± 38.3 m |
| 4 | ± 5.55 m | ± 3.83 m |
| 5 | ± 0.555 m | ± 0.383 m |
| 6 | ± 0.0555 m | ± 0.0383 m |
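
The values in the table follow from treating a coordinate reported to `d` decimal places as rounded to the nearest 10^-d degrees, so the true value lies within half of that step. A minimal sketch in base R, assuming the approximate degree lengths quoted in the text (111 km for latitude, 76.5 km for longitude in Vaud); the variable names are illustrative.

```r
# Approximate length of one arc-degree, as quoted in the text.
km_per_deg_lat <- 111
km_per_deg_lon <- 76.5

decimals <- 1:6

# Half the rounding step, converted from degrees to metres.
half_step_deg <- 10^(-decimals) / 2
precision <- data.frame(
  decimals = decimals,
  lat_m = half_step_deg * km_per_deg_lat * 1000,
  lon_m = half_step_deg * km_per_deg_lon * 1000
)

print(precision)
```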
The reported digits can be used to set a minimum bound if, e.g., only 2 digits are reported, but typically devices will report many digits even if they are not justified. There were 3945 tubes (57.3%) reporting the coordinates in decimal degrees, with the rest using the 6-digit Swiss coordinates and no estimate of precision. The decimal degree coordinates include 686 tubes with coordinates extrapolated based on the reported locality. The reliability of the extrapolated coordinates for extracting local variables like habitat or land use type relies on a clear description of the habitat by the collector.
| Decimals | Tubes | Percent |
|---|---|---|
| 1 | 1 | 0.0% |
| 2 | 52 | 1.3% |
| 3 | 165 | 4.2% |
| 4 | 670 | 17.0% |
| 5 | 669 | 17.0% |
| 6 | 1226 | 31.1% |
| 7 | 229 | 5.8% |
| 8 | 933 | 23.7% |
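
One way to count decimal places is to work on the coordinate as text, since converting to numeric drops trailing zeros. The sketch below is an assumption about how such a count could be done, not the notebook's own code; `count_decimals` is a hypothetical helper, and the counts are copied from the table above.

```r
# Hypothetical helper: number of digits after the decimal point.
# Works on character input; NA stays NA, and values without a decimal point give 0.
count_decimals <- function(x) {
  s <- as.character(x)
  frac <- ifelse(grepl(".", s, fixed = TRUE), sub("^[^.]*\\.", "", s), "")
  n <- nchar(frac)
  n[is.na(x)] <- NA_integer_
  n
}

example <- count_decimals(c("46.5", "46.52", "6.63312", "46", NA))

# Tube counts per number of decimals, copied from the table above.
counts <- c("1" = 1, "2" = 52, "3" = 165, "4" = 670,
            "5" = 669, "6" = 1226, "7" = 229, "8" = 933)

# Share of decimal-degree tubes with fewer than 4 decimal places.
share_below_4 <- sum(counts[c("1", "2", "3")]) / sum(counts)
print(round(100 * share_below_4, 1))
```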
Typically, smartphones are accurate under good conditions to about 5 m in radius, with worse performance around buildings, bridges, trees, etc. It therefore seems likely that coordinates with more than 5 decimal places overstate their precision. More importantly, the 5.5% of decimal-degree locations with fewer than 4 decimal places should not be taken as-is with a high degree of confidence. Again, this metric isn't available for the locations recorded with the Swiss coordinate system (2938 tubes: 43%), but it seems reasonable that the distribution of precision would be roughly similar.
For extracting local conditions based on point locations, it seems reasonable to buffer all points by 5-10 m, with the local habitat or land use type assigned as the dominant category within the buffer. The buffer should not affect distance to nearest road, aside from reducing most distances by a uniform amount (the buffer radius) and reducing points with distances less than the buffer radius to 0 m.
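The effect of a circular buffer on distance to the nearest road can be illustrated with a few made-up distances: the buffer edge is one radius closer than the point, and distances cannot go below zero. The values below are hypothetical and only illustrate the arithmetic.

```r
buffer_radius <- 10                      # buffer radius in metres

# Hypothetical point-to-road distances in metres (not from the dataset).
dist_point <- c(2, 8, 25, 140)

# Distance from the buffer edge: shifted by the radius, floored at 0.
dist_buffer <- pmax(dist_point - buffer_radius, 0)

print(data.frame(dist_point, dist_buffer))
```

The ordering of tubes by distance is preserved, except that all tubes closer than the buffer radius become tied at 0 m.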
It is also a good idea to remove tubes with GEOPRECISION == "extrapolé mauvais" and possibly "extrapolé (base tube précédent)", "extrapolé (église par défaut)", "extrapolé (gare par défaut)", "extrapolé/corrigé (église par défaut)" as the uncertainty seems likely to be greater than 5-10m. Lastly, tubes with fewer than 3 decimals for the lat/lon coordinates should also be removed for the same reasons.
```r
library(dplyr)  # filter(), %>%
library(sf)     # st_buffer()

# GEOPRECISION values whose uncertainty is likely larger than the buffer radius
geo_exclude <- c("extrapolé mauvais",
                 "extrapolé (base tube précédent)",
                 "extrapolé (église par défaut)",
                 "extrapolé (gare par défaut)",
                 "extrapolé/corrigé (église par défaut)")

pub_filt <- ant$pub %>%
  filter(!is.na(GEOPRECISION)) %>%
  filter(!GEOPRECISION %in% geo_exclude) %>%
  filter(is.na(LATITUDE) | nchar(LATITUDE) > 5) %>%  # Swiss coords | Lat decimals
  filter(is.na(LONGITUDE) | nchar(LONGITUDE) > 4)    # Swiss coords | Lon decimals

# Buffer distances are in CRS units; this assumes a metre-based projected CRS
pub.5m <- pub_filt %>% st_buffer(dist = 5)
pub.10m <- pub_filt %>% st_buffer(dist = 10)
```
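
Assigning the dominant category within a buffer could look roughly like the sketch below. It uses a toy land cover layer of two adjacent squares and a single made-up tube, so the layer, the coordinates, the column names (`category`, `tube_id`), and the choice of EPSG:2056 (Swiss LV95) are all assumptions for illustration rather than the notebook's actual data.

```r
library(sf)

# Helper to build a square polygon (toy data only).
sq <- function(x0, y0, size) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Hypothetical land cover layer: forest to the west, meadow to the east.
landcover <- st_sf(
  category = c("forest", "meadow"),
  geometry = st_sfc(sq(2600000, 1200000, 100),
                    sq(2600100, 1200000, 100),
                    crs = 2056)
)

# One made-up tube, 3 m inside the forest polygon, buffered by 10 m.
tube <- st_sf(
  tube_id = 1,
  geometry = st_sfc(st_point(c(2600097, 1200050)), crs = 2056)
)
tube_buf <- st_buffer(tube, dist = 10)

# Split the buffer by land cover and keep the category with the largest area.
pieces <- suppressWarnings(st_intersection(tube_buf, landcover))
pieces$area <- as.numeric(st_area(pieces))
dominant <- pieces$category[which.max(pieces$area)]

print(dominant)
```

With many tubes, the same idea applies per tube: intersect the buffered layer with the land cover layer, sum the area by tube and category, and keep the largest category for each tube.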
There are three land cover / land use datasets available:
- Habitat layer created for the structured sampling
- CORINE Land Cover, which has a broader legend and is consistent across Europe, but uses a minimum mapping unit of 25 ha (500 m × 500 m)
- Land use for largely agricultural land in Vaud in 2019
CORINE and the Opération Fourmis dataset have full coverage across Vaud, while the detailed land use dataset is mostly restricted to open canopy areas in the lower elevations (OpFo, CORINE, VD).
Here is a random area within Vaud showing the differences. The grid is 1 km x 1 km, with the public inventory tubes shown as the small black points (with 5 m and 10 m buffers), and building footprints from OpenStreetMap. (OpFo, CORINE, VD).